Large-Scale Privacy-Preserving Mapping of Human Genomic Sequences on Hybrid Clouds
نویسندگان
چکیده
An operation preceding most human DNA analyses is read mapping, which aligns millions of short sequences (called reads) to a reference genome. This step involves an enormous amount of computation (evaluating edit distances for millions upon billions of sequence pairs) and thus needs to be outsourced to low-cost commercial clouds. This asks for scalable techniques to protect sensitive DNA information, a demand that cannot be met by any existing techniques (e.g., homomorphic encryption, secure multiparty computation). In this paper, we report a new step towards secure and scalable read mapping on the hybrid cloud, which includes both the public commercial cloud and the private cloud within an organization. Inspired by the famous “seed-and-extend” method, our approach strategically splits a mapping task: the public cloud seeks exact matches between the keyed hash values of short read substrings (called seeds) and those of reference sequences to roughly position reads on the genome; the private cloud extends the seeds from these positions to find right alignments. Our novel seed-combination technique further moves most workload of this task to the public cloud. The new approach is found to work effectively against known inference attacks, and also easily scale to millions of reads.
منابع مشابه
Attribute-based Access Control for Cloud-based Electronic Health Record (EHR) Systems
Electronic health record (EHR) system facilitates integrating patients' medical information and improves service productivity. However, user access to patient data in a privacy-preserving manner is still challenging problem. Many studies concerned with security and privacy in EHR systems. Rezaeibagha and Mu [1] have proposed a hybrid architecture for privacy-preserving accessing patient records...
متن کاملA hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy
Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of t...
متن کامل3D Point Cloud Encryption Through Chaotic Mapping
Three dimensional (3D) contents such as 3D point clouds, 3D meshes and 3D surface models are increasingly growing and being widely spread into the industry and our daily life. However, less people consider the problem of the privacy preserving of 3D contents. As an attempt towards 3D security, in this papers, we propose methods of encrypting the 3D point clouds through chaotic mapping. 2 scheme...
متن کاملSecurity and Privacy Aspects in MapReduce on Clouds: A Survey
MapReduce is a programming system for distributed processing large-scale data in an efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern ma...
متن کاملSAFETY: Secure gwAs in Federated Environment Through a hYbrid solution with Intel SGX and Homomorphic Encryption
Recent studies demonstrate that effective healthcare can benefit from using the human genomic information. For instance, analysis of tumor genomes has revealed 140 genes whose mutations contribute to cancer 1. As a result, many institutions are using statistical analysis of genomic data, which are mostly based on genome-wide association studies (GWAS). GWAS analyze genome sequence variations in...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012